| Algorithm | graph-tool (16 threads) | graph-tool (1 thread) | igraph | NetworkX |
|---|---|---|---|---|
| Single-source shortest path | 0.0023 s | 0.0022 s | 0.0092 s | 0.25 s |
| Global clustering | 0.011 s | 0.025 s | 0.027 s | 7.94 s |
| PageRank | 0.0052 s | 0.022 s | 0.072 s | 1.54 s |
| K-core | 0.0033 s | 0.0036 s | 0.0098 s | 0.72 s |
| Minimum spanning tree | 0.0073 s | 0.0072 s | 0.026 s | 0.64 s |
| Betweenness | 102 s (~1.7 mins) | 331 s (~5.5 mins) | 198 s (vertex) + 439 s (edge) (~ 10.6 mins) | 10297 s (vertex) 13913 s (edge) (~6.7 hours) |
pip install networkx
jupyter notebook
Compared to
docker pull tiagopeixoto/graph-tool
docker run -p 8888:8888 -p 6006:6006 -it -u user -w /home/user -v /home/kfuruglyas:/home/user tiagopeixoto/graph-tool bash
[user@c20b92b8c4bf ~]$ jupyter notebook --ip 0.0.0.0
from graph_tool.all import *
import numpy as np
import matplotlib.pyplot as plt
num_of_nodes, num_of_edges = 9, 18
def create_edges(num_of_nodes, num_of_edges, self_loops = False, multiedges = True, weighted = True):
edges_labels = {}
while len(edges_labels.keys()) < num_of_edges:
e_pair = tuple(np.random.choice(num_of_nodes, 2,self_loops))
if multiedges or (e_pair not in list(edges_labels.keys()) and e_pair[::-1] not in list(edges_labels.keys())):
e_weight = np.random.rand() if weighted else 1
edges_labels[e_pair] = e_weight
return edges_labels
np.random.seed(0)
edges_labels = create_edges(num_of_nodes, num_of_edges)
g = Graph()
g.add_vertex(num_of_nodes);
for i,j in edges_labels.keys():
g.add_edge(i,j)
eprop_wei = g.new_edge_property("double")
eprop_str = g.new_edge_property("string")
for c,key in enumerate(edges_labels.keys()):
e = list(g.edges())[c]
eprop_wei[e] = edges_labels[key]*10
eprop_str[e] = f"{edges_labels[key]:.2f}"
vprop_id = g.new_vertex_property("string")
for i in g.vertices():
vprop_id[i] = str(i)
pos = random_layout(g)
graph_draw(g, pos, vertex_text = vprop_id,
vertex_size=30, edge_pen_width=eprop_wei,
edge_text = eprop_str);
g.set_directed(False)
pos = arf_layout(g)
graph_draw(g, pos, vertex_text = vprop_id,
vertex_size=30, edge_pen_width=eprop_wei, edge_text = eprop_str, );
pos = radial_tree_layout(g, 2)
graph_draw(g, pos, vertex_text = vprop_id,
vertex_size=30, edge_pen_width=eprop_wei, edge_text = eprop_str,
vertex_color = 'green', vertex_shape = "heptagon",vertex_fill_color = 'k',
edge_color = "cyan", edge_font_family= "cursive", edge_mid_marker = "bar");
where description length is
$$ \Sigma = -\ln P\left( \mathbf{A}\left| \right.\mathbf{\theta}, \mathbf{b} \right) - \ln P\left( \mathbf{\theta}, \mathbf{b}\right). $$Measures the amount of information required to describe the data
Example of a nested SBM with three levels.
Goal is to minimize description length [7]
$$ P\left( \mathbf{b}\left| \right. \mathbf{A} \right) = \frac{ \exp\left({-\Sigma}\right)}{P\left( \mathbf{A} \right)}, \text{ where } \Sigma = -\ln P\left( \mathbf{A}\left| \right.\mathbf{\theta}, \mathbf{b} \right) - \ln P\left( \mathbf{\theta}, \mathbf{b}\right). $$Using Markov chain Monte Carlo (MCMC)
minimize_blockmodel_dl()
minimize_nested_blockmodel_dl()
g = collection.data["football"]
print(g)
for k in g.graph_properties.keys():
print(k+":", g.gp[k])
<Graph object, undirected, with 115 vertices and 613 edges, 4 internal vertex properties, 2 internal graph properties, at 0x7fbf94a39100> readme: The file football.gml contains the network of American football games between Division IA colleges during regular season Fall 2000, as compiled by M. Girvan and M. Newman. The nodes have values that indicate to which conferences they belong. The values are as follows: 0 = Atlantic Coast 1 = Big East 2 = Big Ten 3 = Big Twelve 4 = Conference USA 5 = Independents 6 = Mid-American 7 = Mountain West 8 = Pacific Ten 9 = Southeastern 10 = Sun Belt 11 = Western Athletic If you make use of these data, please cite M. Girvan and M. E. J. Newman, Community structure in social and biological networks, Proc. Natl. Acad. Sci. USA 99, 7821-7826 (2002). Correction: Two edges were erroneously duplicated in this data set, and have been removed (21 SEP 2014) description: American College football: network of American football games between Division IA colleges during regular season Fall 2000. Please cite M. Girvan and M. E. J. Newman, Proc. Natl. Acad. Sci. USA 99, 7821-7826 (2002). Retrieved from `Mark Newman's website <http://www-personal.umich.edu/~mejn/netdata/>`_. This file also contains corrections made by T. S. Evans, available `here <http://figshare.com/articles/American_College_Football_Network_Files/93179>`_. Thus, please also cite T.S. Evans, "Clique Graphs and Overlapping Communities", J.Stat.Mech. (2010) P12037 [arXiv:1009.0638].
states = [minimize_blockmodel_dl(g) for _ in range(10)]
entropies = [state.entropy() for state in states]
# minimize_blockmodel_dl(g) returns BlockState object
plt.plot(range(10), entropies, '-o')
plt.xlabel("Trial"), plt.ylabel("Entropy"), plt.grid()
state = states[np.argmin(entropies)]
state.draw(pos=g.vp.pos,vertex_text= g.vp['value']);
b = state.get_blocks()
r = b[10] # group membership of vertex 10
print(r)
3
e = state.get_matrix()
im = plt.imshow(e.todense())
plt.colorbar(im);
g = collection.data["celegansneural"]
print(g)
for k in g.graph_properties.keys():
print(k+":", g.gp[k])
<Graph object, directed, with 297 vertices and 2359 edges, 2 internal vertex properties, 1 internal edge property, 2 internal graph properties, at 0x7fbf42f2cf10> description: Neural network: A directed, weighted network representing the neural network of C. Elegans. Data compiled by D. Watts and S. Strogatz and made available on the web `here <http://cdg.columbia.edu/cdg/datasets>`_. Please cite D. J. Watts and S. H. Strogatz, Nature 393, 440-442 (1998). Original experimental data taken from J. G. White, E. Southgate, J. N. Thompson, and S. Brenner, Phil. Trans. R. Soc. London 314, 1-340 (1986). Retrieved from `Mark Newman's website <http://www-personal.umich.edu/~mejn/netdata/>`_. readme: Neural network of the nematode C. Elegans Compiled by Duncan Watts and Steven Strogatz from original experimental data by White et al. The file celegansneural.gml describes a weighted, directed network representing the neural network of C. Elegans. The data were taken from the web site of Prof. Duncan Watts at Columbia University, http://cdg.columbia.edu/cdg/datasets. The nodes in the original data were not consecutively numbered, so they have been renumbered to be consecutive. The original node numbers from Watts' data file are retained as the labels of the nodes. Edge weights are the weights given by Watts. These data can be cited as: J. G. White, E. Southgate, J. N. Thompson, and S. Brenner, "The structure of the nervous system of the nematode C. Elegans", Phil. Trans. R. Soc. London 314, 1-340 (1986). D. J. Watts and S. H. Strogatz, "Collective dynamics of `small-world' networks", Nature 393, 440-442 (1998).
states = [minimize_nested_blockmodel_dl(g) for _ in range(10)]
entropies = [state.entropy() for state in states]
# minimize_blockmodel_dl(g) returns BlockState object
plt.plot(range(10), entropies, '-o')
plt.xlabel("Trial"), plt.ylabel("Entropy"), plt.grid()
state = states[np.argmin(entropies)]
state.print_summary()
state.draw();
l: 0, N: 297, B: 20 l: 1, N: 20, B: 6 l: 2, N: 6, B: 3 l: 3, N: 3, B: 1
g = collection.data["celegansneural"]
state = NestedBlockState(g)
state.print_summary()
state.entropy()
l: 0, N: 297, B: 1 l: 1, N: 1, B: 1 l: 2, N: 1, B: 1 l: 3, N: 1, B: 1 l: 4, N: 1, B: 1 l: 5, N: 1, B: 1 l: 6, N: 1, B: 1 l: 7, N: 1, B: 1 l: 8, N: 1, B: 1 l: 9, N: 1, B: 1
9468.527791619807
entropies = []
for _ in range(200): # this should be sufficiently large
state.multiflip_mcmc_sweep(beta=np.inf, niter=10)
entropies.append(state.entropy())
state.print_summary()
plt.semilogy(range(200), entropies)
plt.xlabel("Flip count"), plt.ylabel("Entropy"), plt.grid();
l: 0, N: 297, B: 12 l: 1, N: 12, B: 5 l: 2, N: 5, B: 2 l: 3, N: 2, B: 1 l: 4, N: 1, B: 1 l: 5, N: 1, B: 1 l: 6, N: 1, B: 1 l: 7, N: 1, B: 1 l: 8, N: 1, B: 1 l: 9, N: 1, B: 1
state = NestedBlockState(g)
mcmc_anneal(state, beta_range=(1, 10), niter=500, mcmc_equilibrate_args=dict(force_niter=10))
state.print_summary()
l: 0, N: 297, B: 16 l: 1, N: 16, B: 7 l: 2, N: 7, B: 3 l: 3, N: 3, B: 1 l: 4, N: 1, B: 1 l: 5, N: 1, B: 1 l: 6, N: 1, B: 1 l: 7, N: 1, B: 1 l: 8, N: 1, B: 1 l: 9, N: 1, B: 1
Combinations like minimize_nested_blockmodel_dl $\rightarrow$ mcmc_anneal could give better results.
g = collection.data["celegansneural"]
state_ndc = minimize_nested_blockmodel_dl(g, deg_corr=False)
state_dc = minimize_nested_blockmodel_dl(g, deg_corr=True)
print("Non-degree-corrected DL:\t", state_ndc.entropy())
print("Degree-corrected DL:\t\t", state_dc.entropy())
Non-degree-corrected DL: 8497.380533906504 Degree-corrected DL: 8238.116839291697
g = collection.data["football"]
state_ndc = minimize_nested_blockmodel_dl(g, deg_corr=False)
state_dc = minimize_nested_blockmodel_dl(g, deg_corr=True)
print("Non-degree-corrected DL:\t", state_ndc.entropy())
print("Degree-corrected DL:\t\t", state_dc.entropy())
Non-degree-corrected DL: 1733.7887076547586 Degree-corrected DL: 1780.5767169430787
len(list(g.get_vertices()))
77
g = collection.data["lesmis"]
state = BlockState(g, B=1) # This automatically initializes the state with a partition
# into one group. The user could also pass a higher number
# to start with a random partition of a given size, or pass a
# specific initial partition using the 'b' parameter.
# Now we run 1,000 sweeps of the MCMC. Note that the number of groups
# is allowed to change, so it will eventually move from the initial
# value of B=1 to whatever is most appropriate for the data.
print(f"Initial dl: {state.entropy()}")
dS, nattempts, nmoves = state.multiflip_mcmc_sweep(niter=10000)
print("Change in description length:", dS)
print("Number of vertex moves attempts:", nattempts)
print("Number of accepted vertex moves:", nmoves)
print(f"Final dl: {state.entropy()}")
Initial dl: 787.4999819793873 Change in description length: -77.75071689075722 Number of vertex moves attempts: 3529016 Number of accepted vertex moves: 435709 Final dl: 709.7492650886035
# We will accept equilibration if 10 sweeps are completed without a
# record breaking event, 2 consecutive times.
state = BlockState(g, B=1)
mcmc_equilibrate(state, wait=1000, nbreaks=2, mcmc_args=dict(niter=10))
(710.1487521210951, 10730684, 1330814)
# We will first equilibrate the Markov chain
mcmc_equilibrate(state, wait=1000, mcmc_args=dict(niter=10))
bs = [] # collect some partitions
def collect_partitions(s):
global bs
bs.append(s.b.a.copy())
# Now we collect partitions for exactly 100,000 sweeps, at intervals
# of 10 sweeps:
mcmc_equilibrate(state, force_niter=10000, mcmc_args=dict(niter=10),
callback=collect_partitions)
# Disambiguate partitions and obtain marginals
pmode = PartitionModeState(bs, converge=True)
pv = pmode.get_marginal(g)
# Now the node marginals are stored in property map pv. We can
# visualize them as pie charts on the nodes:
state.draw(pos=g.vp.pos, vertex_shape="pie", vertex_pie_fractions=pv,)
<VertexPropertyMap object with value type 'vector<double>', for Graph 0x7fbf548fd970, at 0x7fbf58cfa580>
g = collection.data["football"]
state = PPBlockState(g)
# Now we run 1,000 sweeps of the MCMC with zero temperature.
state.multiflip_mcmc_sweep(beta=np.inf, niter=1000)
state.draw(pos=g.vp.pos,vertex_text= g.vp['value'])
state.get_blocks()
<VertexPropertyMap object with value type 'int32_t', for Graph 0x7fbf4158a400, at 0x7fbf417069d0>
MeasuredBlockState $\rightarrow$ MixedMeasuredBlockState $\rightarrow$ UncertainBlockState g = collection.data["lesmis"].copy()
# pretend we have measured and observed each edge twice
n = g.new_ep("int", 2) # number of measurements
x = g.new_ep("int", 2) # number of observations
e = g.edge(11, 36)
x[e] = 1 # pretend we have observed edge (11, 36) only once
e = g.add_edge(15, 73)
n[e] = 2 # pretend we have measured non-edge (15, 73) twice,
x[e] = 1 # but observed it as an edge once.
# We inititialize MeasuredBlockState, assuming that each non-edge has
# been measured only once (as opposed to twice for the observed
# edges), as specified by the 'n_default' and 'x_default' parameters.
state = MeasuredBlockState(g, n=n, n_default=1, x=x, x_default=0)
# We will first equilibrate the Markov chain
mcmc_equilibrate(state, wait=1000, mcmc_args=dict(niter=10))
(1050.8147639731162, 11496984, 2820747)
# Now we collect the marginals for exactly 100,000 sweeps, at
# intervals of 10 sweeps:
u = None # marginal posterior edge probabilities
bs = [] # partitions
cs = [] # average local clustering coefficient
def collect_marginals(s):
global u, bs, cs
u = s.collect_marginal(u)
bstate = s.get_block_state()
bs.append(bstate.levels[0].b.a.copy())
cs.append(local_clustering(s.get_graph()).fa.mean())
mcmc_equilibrate(state, force_niter=10000, mcmc_args=dict(niter=10),
callback=collect_marginals)
eprob = u.ep.eprob
print("Posterior probability of edge (11, 36):", eprob[u.edge(11, 36)])
print("Posterior probability of non-edge (15, 73):", eprob[u.edge(15, 73)])
print("Estimated average local clustering: %g ± %g" % (np.mean(cs), np.std(cs)))
Posterior probability of edge (11, 36): 0.5445544554455446 Posterior probability of non-edge (15, 73): 0.0175017501750175 Estimated average local clustering: 0.571459 ± 0.00325141
# The maximum marginal posterior estimator can be obtained by
# filtering the edges with probability larger than .5
u = GraphView(u, efilt=u.ep.eprob.fa > .5)
# Mark the recovered true edges as red, and the removed spurious edges as green
ecolor = u.new_ep("vector<double>", val=[0, 0, 0, .6])
for e in u.edges():
if g.edge(e.source(), e.target()) is None or (e.source(), e.target()) == (11, 36):
ecolor[e] = [1, 0, 0, .6]
for e in g.edges():
if u.edge(e.source(), e.target()) is None:
ne = u.add_edge(e.source(), e.target())
ecolor[ne] = [0, 1, 0, .6]
# Duplicate the internal block state with the reconstructed network
# u, for visualization purposes.
bstate = state.get_block_state()
bstate = bstate.levels[0].copy(g=u)
# Disambiguate partitions and obtain marginals
pmode = PartitionModeState(bs, converge=True)
pv = pmode.get_marginal(u)
edash = u.new_ep("vector<double>")
edash[u.edge(15, 73)] = [.1, .1, 0]
bstate.draw(pos=u.own_property(g.vp.pos), vertex_shape="pie", vertex_pie_fractions=pv,
edge_color=ecolor, edge_dash_style=edash, edge_gradient=None,)
<VertexPropertyMap object with value type 'vector<double>', for Graph 0x7fbf4157b700, at 0x7fbf41c32d90>
get_edges_prob
Two non-existing edges in the football network (in red):
g = collection.data["football"]
missing_edges = [(101, 102), (17, 56)]
L = 10
state = minimize_nested_blockmodel_dl(g, deg_corr=True)
bs = state.get_bs() # Get hierarchical partition.
bs += [np.zeros(1)] * (L - len(bs)) # Augment it to L = 10 with
# single-group levels.
state = state.copy(bs=bs, sampling=True)
probs = ([], [])
def collect_edge_probs(s):
p1 = s.get_edges_prob([missing_edges[0]], entropy_args=dict(partition_dl=False))
p2 = s.get_edges_prob([missing_edges[1]], entropy_args=dict(partition_dl=False))
probs[0].append(p1)
probs[1].append(p2)
# Now we collect the probabilities for exactly 100,000 sweeps
mcmc_equilibrate(state, force_niter=10000, mcmc_args=dict(niter=10),
callback=collect_edge_probs)
def get_avg(p):
p = np.array(p)
pmax = p.max()
p -= pmax
return pmax + np.log(np.exp(p).mean())
p1 = get_avg(probs[0])
p2 = get_avg(probs[1])
p_sum = get_avg([p1, p2]) + np.log(2)
l1 = p1 - p_sum
l2 = p2 - p_sum
print("likelihood-ratio for %s: %g" % (missing_edges[0], np.exp(l1)))
print("likelihood-ratio for %s: %g" % (missing_edges[1], np.exp(l2)))
likelihood-ratio for (101, 102): 0.36719 likelihood-ratio for (17, 56): 0.63281